add PCI device interface and a vfio backend driver #487
Conversation
Force-pushed from db86eb1 to 7c06e0b
Thanks for the PR! I am seeing some failures on the x86_64 tests:
as well as compile errors on mingw:
It would be great if you could try running the tests in your environment to confirm whether they fail there as well. I'll follow up in the next few days with a more in-depth review.
Hi Octavian! Thanks for the comment!
Maybe change the title, since uio is no longer used? The reason for using vfio instead would be nice to have in the PR comments.
Force-pushed from 1e2fc33 to dcfd93e
left several comments.
arch/lkl/mm/bootmem.c (outdated):
	_memory_start = (unsigned long)lkl_ops->mem_alloc(mem_size);
	if (lkl_ops->pci_ops && lkl_ops->pci_ops->alloc) {
		_memory_start =
			(unsigned long)lkl_ops->pci_ops->alloc(mem_size);
When pci_ops is installed but not used, do we need to use pci_ops->alloc()?
Actually, pci_ops->alloc() just calls mmap() internally, so I updated the patch and introduced a new interface, lkl_ops->page_alloc(), whose implementation is equivalent to that of pci_ops->alloc().
Now the PCI-related functions are removed from the kernel memory allocation path.
https://github.com/lkl/linux/pull/487/files#diff-65cbb8e45dedadd081421ddc9002d12fR17
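For illustration, a host-side allocator of the kind described above might look like the sketch below; the function name host_page_alloc is an assumption, not the PR's actual symbol. The key property is that mmap() returns page-aligned anonymous memory, which plain malloc() does not guarantee.

	/* Hypothetical sketch of an mmap-based page allocation helper. */
	#include <stddef.h>
	#include <sys/mman.h>

	static void *host_page_alloc(unsigned long size)
	{
		/* Anonymous private mapping: page-aligned, zero-filled memory. */
		void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		return (mem == MAP_FAILED) ? NULL : mem;
	}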
LGTM, asking @tavip for more comments, if any.
Could you please elaborate on why page_alloc is needed? The only user I see is in bootmem_init, and there we already make sure that we skip host memory until it is page aligned.
Good point! In this scenario (with the vfio backend), we can just use memory allocated by malloc().
Meanwhile, in other cases, I believe LKL should provide an alternative memory allocation interface, other than host_ops->alloc(), for PCI DMA. For example, an architecture without an IOMMU requires physically contiguous pages for bootmem, which is impossible to allocate with malloc(). Of course, developers can override host_ops->alloc() for this, but that turns out to be memory-inefficient because host_ops->alloc() would always have to return whole pages. This is why I proposed pci_ops->alloc() in the first commit.
However, your comment made me realize that the name "page_alloc" easily leads to misunderstanding. What we need here is not page allocation, but allocating memory suited to the PCI backend driver. I'm considering how to fix this and will apply the change in a couple of days.
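As a rough illustration of the interface being discussed, the backend-specific allocation hook could take a shape like the following; the struct and field names are assumptions made for the example, not necessarily what the PR defines.

	/* Hypothetical shape of a PCI-backend allocation hook. */
	struct lkl_pci_backend_ops_example {
		/* Return memory the PCI backend can expose to the device for
		 * DMA, e.g. physically contiguous, page-aligned memory on
		 * hosts that lack an IOMMU. */
		void *(*alloc)(unsigned long size);
		/* Release memory previously obtained from alloc(). */
		void (*free)(void *mem, unsigned long size);
	};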
Got it, thanks for the detailed explanation. pci_ops->alloc() sounds good (maybe later we can convert it into a host dma_alloc if there are other users).
One thing that is not clear to me is who will use this memory. I don't think we need to allocate bootmem with it; perhaps just internal PCI buffers?
I totally agree that allocating the whole LKL memory with dma_alloc is not a sophisticated approach.
However, Linux components assume that memory allocated by kmalloc() or the other memory allocators is DMA-capable.
Documentation/DMA-API-HOWTO.txt says
If you acquired your memory via the page allocator
(i.e. __get_free_page*()) or the generic memory allocators
(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from
that memory using the addresses returned from those routines.
This makes things difficult, because it means that DMA-capable memory allocators are often not called at all. (If they were, we could just hook dma_alloc() and return DMA-capable memory.)
There are some ways to work around this, but none of them are workable with modern device drivers such as NIC and NVMe drivers.
The solutions I examined (and discarded) are:
- Using zones to restrict DMA to limited memory ranges. This is meant for devices with limited address spaces, but it does not work with the NVMe driver, which requires the full address space.
- Bounce buffers. PCI drivers for some ARM SoCs implement this. It does not work either, because the NVMe driver requires coherent memory and does not invoke DMA sync operations.
So, IMO, the only way to implement this PR is to make the whole bootmem DMA-capable, even though this seems ugly...
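To make the constraint concrete, below is the kind of driver pattern the DMA-API-HOWTO excerpt permits and that LKL cannot intercept: the buffer comes straight from kmalloc() and the driver simply maps it, so the only way to satisfy such drivers is for all kernel memory to already be DMA-capable. This is an illustrative example, not code from this PR.

	#include <linux/slab.h>
	#include <linux/dma-mapping.h>

	/* Typical driver pattern: DMA straight from kmalloc()'d memory. */
	static int example_dma_from_kmalloc(struct device *dev, size_t len)
	{
		void *buf = kmalloc(len, GFP_KERNEL);	/* ordinary kernel memory */
		dma_addr_t handle;

		if (!buf)
			return -ENOMEM;

		/* No dma_alloc_coherent() here that LKL could hook; the driver
		 * assumes this mapping just works. */
		handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, handle)) {
			kfree(buf);
			return -EIO;
		}

		dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
		kfree(buf);
		return 0;
	}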
Force-pushed from 3c42fda to dd0f447
I created another PR to lkl/lkl-docker, which provides QEMU and a disk image for testing this feature. As soon as the PR to lkl/lkl-docker is merged, I will create a new PR (or add a commit to this PR) to replace the Docker image.
Force-pushed from fa70c05 to 862ef0d
LGTM. Could you also please rebase and remove the merge commit to keep the history clean? Thanks a lot for the hard work, Shinichi! With regard to the page_alloc/DMA stuff, I think it is something we can look into later. A combination of a DMA zone and dma_ops should work, but I don't know enough about how the NVMe driver operates to formulate a concrete suggestion. My rationale for avoiding making bootmem DMA-capable is to avoid confusion in the host ops, since in some environments / LKL applications no DMA is required.
LKL is now able to manage PCI devices by itself, like DPDK!

This commit implements a PCI bus driver and a PCI device interface. The commit also includes a vfio-pci backend driver as a reference implementation, so no extra kernel module is needed if the PCI device is assigned to VFIO.

I believe the vfio backend driver fulfills the needs of most cases, but users can inject wrapper functions for another userspace PCI framework such as uio-pci-generic or a handmade kernel module. In either case, the framework should provide physically contiguous memory for DMA, because the kernel and some drivers (e.g. NVMe) assume their memory is physically contiguous.

Signed-off-by: Shinichi Awamoto <shinichi.awamoto@gmail.com>
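As a hedged illustration of the "wrapper functions for another userspace PCI framework" idea, a non-vfio backend such as uio-pci-generic typically reaches PCI config space through sysfs rather than through the VFIO ioctl interface. The helper below only shows that access pattern; the function name, the example sysfs path, and how such a wrapper would plug into the PR's interface are assumptions.

	#include <fcntl.h>
	#include <stddef.h>
	#include <sys/types.h>
	#include <unistd.h>

	/* Read PCI config space via sysfs, e.g. from a path such as
	 * "/sys/bus/pci/devices/0000:00:04.0/config" (example only). */
	static int sysfs_pci_conf_read(const char *config_path, unsigned int offset,
				       void *buf, size_t len)
	{
		int fd = open(config_path, O_RDONLY);
		ssize_t ret;

		if (fd < 0)
			return -1;

		ret = pread(fd, buf, len, offset);
		close(fd);

		return (ret == (ssize_t)len) ? 0 : -1;
	}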
Thanks for all your support. Commits are rebased and merged. Also, .circleci/config.yaml refers to … I agree that we need further discussion for better memory management. Ideally, we could fix the NVMe driver to allocate its memory from the DMA zone; this would enable us to separate bootmem and DMA memory completely.